Summary: This tutorial demonstrates an example workflow using different FsLab libraries. The aim is to create a correlation network, find a threshold by which to filter its edges, and visualize the result.
Networks provide a mathematical representation of connections found everywhere, e.g. computers connected through the internet, friends connected by friendships or animals connected in the food web.
This mathematical representation allows many different but universal approaches to creating, manipulating and interrogating these networks for new information. For example, the most important nodes (or vertices)
can be identified according to different metrics, or the most efficient connection between two nodes can be found.
One widely used kind of network in biology is the gene co-expression network. Here the nodes are genes and the edges (or links) between them represent how similar their expression patterns are. One measure of
this similarity is the correlation between the expression patterns. This kind of network is often used to find interesting candidates by identifying genes which are highly connected with known genes of interest.
This tutorial presents a simple workflow for creating and visualizing a correlation network from experimental gene expression data. For this, the following FsLab libraries are used:
    #r "nuget: FSharp.Data"
    #r "nuget: Deedle"
    #r "nuget: FSharp.Stats"
    #r "nuget: Cyjs.NET"
    #r "nuget: Plotly.NET, 2.0.0-preview.16"

    do fsi.AddPrinter(fun (printer: Deedle.Internal.IFsiFormattable) -> "\n" + (printer.Format()))

    // The edge filtering method presented in this tutorial requires an Eigenvalue decomposition.
    // FSharp.Stats uses the one implemented in the LAPACK library.
    // To enable it just reference the lapack folder in the FSharp.Stats nuget package:
    FSharp.Stats.ServiceLocator.setEnvironmentPathVariable @"C:\Users\USERNAME\.nuget\packages\fsharp.stats\0.4.2\netlib_LAPACK"
    // FSharp.Stats.Algebra.LinearAlgebra.Service()
In this tutorial, a multi-experiment E. coli gene expression dataset is used.
FSharp.Data and Deedle are used to load the data into F# Interactive (fsi).
    open FSharp.Data
    open Deedle

    // Load the data
    let rawData = Http.RequestString @"https://raw.githubusercontent.com/HLWeil/datasets/main/data/ecoliGeneExpression.tsv"

    // Create a deedle frame and index the rows with the values of the "Key" column.
    let rawFrame : Frame<string,string> =
        Frame.ReadCsvString(rawData, separators = "\t")
        |> Frame.take 500
        |> Frame.indexRows "Key"
Networks can be represented in many different ways. One representation which is computationally efficient for many approaches is the adjacency matrix.
Here every node is represented by an index, and the strength of the connection between two nodes is the value in the matrix at the position given by their indices.
In our case, the nodes of the network are genes in Escherichia coli (a well-studied bacterium). In a correlation network, the strength of a connection is the correlation.
The correlation between two genes is calculated from their expression across the different experiments. For this we use the Pearson correlation.
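To make the idea of a correlation-based adjacency matrix concrete, the following is a minimal sketch that uses only the F# core library rather than FSharp.Stats. The `pearson` helper and the toy `expression` data are hypothetical and only serve as an illustration; they are not part of the tutorial's dataset or of any FsLab API.

```fsharp
// Pearson correlation between two equally long expression profiles.
let pearson (xs: float[]) (ys: float[]) =
    let meanX = Array.average xs
    let meanY = Array.average ys
    // Sum of the products of the deviations from the respective means
    let cov = Array.fold2 (fun acc x y -> acc + (x - meanX) * (y - meanY)) 0.0 xs ys
    let varX = xs |> Array.sumBy (fun x -> (x - meanX) ** 2.0)
    let varY = ys |> Array.sumBy (fun y -> (y - meanY) ** 2.0)
    cov / sqrt (varX * varY)

// Hypothetical toy data: three genes (rows) measured in four experiments (columns).
let expression =
    [|
        [| 1.0; 2.0; 3.0; 4.0 |]   // gene 0
        [| 2.0; 4.0; 6.0; 8.0 |]   // gene 1: rises together with gene 0
        [| 4.0; 3.0; 2.0; 1.0 |]   // gene 2: falls when gene 0 rises
    |]

// Adjacency matrix: entry (i, j) holds the correlation between gene i and gene j.
let adjacency =
    Array2D.init expression.Length expression.Length (fun i j ->
        pearson expression.[i] expression.[j])

printfn "%A" adjacency
```

In this toy example, genes 0 and 1 are perfectly correlated, while gene 2 is perfectly anti-correlated with both. The diagonal always holds 1, since every profile correlates perfectly with itself.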
    open FSharp.Stats
    open Plotly.NET

    // Get the rows as a matrix
    let rows =
        rawFrame
        |> Frame.toJaggedArray
        |> Matrix.ofJaggedArray

    // Create a correlation network by computing the pearson correlation between every two rows
    let correlationNetwork =
        Correlation.Matrix.rowWisePearson rows

    // Histogram over the correlations for visualizing the distribution
    // (the diagonal, i.e. the correlation of each gene with itself, is skipped)
    let correlationHistogram =
        correlationNetwork
        |> Matrix.toJaggedArray
        |> Array.mapi (fun i a -> a |> Array.indexed |> Array.choose (fun (j,v) -> if i = j then None else Some v))
        |> Array.concat
        |> Chart.Histogram
    // Send the histogram to the browser
    correlationHistogram |> Chart.show
As can be seen, the correlation between most genes is relatively weak. The correlations roughly follow a right-skewed Gaussian distribution, so in this dataset genes are more likely to be correlated than anti-correlated.
This correlation network is not yet the end product, though, as every gene is still connected with every other gene. Many useful algorithms, like module finding, can only distinguish between
whether an edge between two vertices exists or not, instead of taking the strength of the connection into consideration. Therefore, many questions you want the network to answer require a selection step
in which strong connections are kept and weak ones are discarded. This is called thresholding. Different algorithms exist for this. Here we use an algorithm based on Random Matrix Theory (RMT).
The basic idea behind this RMT approach is to filter the network until a modular state is reached. Modularity is a measure of how strongly the nodes in a network form groups, where connections between members of the same group are
stronger or more likely than connections between members of different groups. Biological networks are generally regarded as modular, as simpler parts (like proteins resulting from gene expression)
usually need to work closely together to carry out more complex functions (like photosynthesis).
Finding this threshold is an iterative process, shown above. For each candidate threshold, the eigenvalues of the matrix are calculated and normalized, and the spacings between these eigenvalues are computed. For an evenly filled matrix, the
frequency of these spacings follows Wigner's surmise (see left picture above). Once enough edges have been filtered out that an underlying modular structure is revealed, the spacings start to follow the Poisson distribution.
The algorithm searches, with a given accuracy, for the point at which this switch from one distribution to the other occurs (see right picture above).
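The two reference distributions can be written down explicitly: Wigner's surmise for the Gaussian orthogonal ensemble is P(s) = (π/2) · s · exp(−π s² / 4), and the Poisson case is P(s) = exp(−s). The following is a minimal sketch of both densities together with a simple spacing calculation. It is not the FSharp.Stats implementation: the function names are hypothetical, the eigenvalues are made up, and dividing by the mean spacing is a simplifying assumption that stands in for a proper normalization of the spectrum.

```fsharp
// Wigner's surmise (GOE): expected spacing density for a densely filled random matrix.
let wignerSurmise (s: float) =
    System.Math.PI / 2.0 * s * exp (-System.Math.PI * s * s / 4.0)

// Poisson case: expected spacing density once the eigenvalues are uncorrelated.
let poissonSpacing (s: float) = exp (-s)

// Nearest-neighbour spacings of a set of eigenvalues.
// Assumption: dividing by the mean spacing stands in for a proper normalization.
let normalizedSpacings (eigenvalues: float[]) =
    let spacings =
        eigenvalues
        |> Array.sort
        |> Array.pairwise
        |> Array.map (fun (a, b) -> b - a)
    let meanSpacing = Array.average spacings
    spacings |> Array.map (fun d -> d / meanSpacing)

// Hypothetical eigenvalues, for illustration only.
let spacings = normalizedSpacings [| 3.0; 0.0; 1.0 |]

for s in spacings do
    printfn "s = %.3f  Wigner = %.3f  Poisson = %.3f" s (wignerSurmise s) (poissonSpacing s)
```

The key difference between the two densities is their behaviour near zero: Wigner's surmise vanishes for s close to 0 (eigenvalues repel each other), whereas the Poisson density is largest there. A histogram of the observed spacings can therefore be compared against both curves to judge which regime a filtered matrix is in.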
    // Calculate the critical threshold with an accuracy of 0.01
    let threshold,_ = Testing.RMT.compute 0.9 0.01 0.05 correlationNetwork
    // Send the histogram to the browser
    correlationHistogramFiltered |> Chart.show
After filtering the edges according to the critical threshold found using RMT, only strongly correlated genes are regarded as linked. As the distribution of all correlations was slightly skewed towards higher values, only a few anti-correlations meet the threshold.
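The code that defines `filteredNetwork` is not shown in this section. The following is a minimal sketch of such a thresholding step on a plain 2D array, under the assumption that an edge is kept when the absolute value of its correlation exceeds the threshold and is set to 0 otherwise. The function `filterByThreshold`, the toy matrix and the threshold of 0.8 are hypothetical and only serve as an illustration.

```fsharp
// Keep an entry only if its absolute value exceeds the threshold; otherwise set it to 0.
let filterByThreshold (threshold: float) (network: float[,]) =
    network |> Array2D.map (fun v -> if abs v > threshold then v else 0.0)

// Hypothetical 3x3 correlation matrix, for illustration only.
let toyNetwork =
    array2D [
        [  1.0;  0.9;  -0.2  ]
        [  0.9;  1.0;  -0.85 ]
        [ -0.2; -0.85;  1.0  ]
    ]

// Hypothetical threshold of 0.8
let filtered = filterByThreshold 0.8 toyNetwork

printfn "%A" filtered
```

Using the absolute value keeps strong anti-correlations (here -0.85) as edges, while the weak entry -0.2 is discarded. The diagonal entries stay at 1 and have to be ignored when counting the neighbours of a node.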
Finally, the resulting network can be visualized. For this we use Cyjs.NET, an FsLab library which makes use of the Cytoscape.js network visualization tool.
Further information about styling the graphs can be found here.
    open Cyjs.NET

    // The styled vertices. The size is based on the degree of this vertex, so that more heavily connected nodes are emphasized
    let cytoVertices =
        rawFrame.RowKeys
        |> Seq.toList
        |> List.indexed
        |> List.choose (fun (i,v) ->
            let degree =
                Matrix.getRow filteredNetwork i
                |> Seq.filter ((<>) 0.)
                |> Seq.length
            let styling = [CyParam.label v; CyParam.weight (sqrt (float degree) + 1. |> (*) 10.)]

            if degree > 1 then
                Some (Elements.node (string i) styling)
            else
                None
        )

    // Styled edges
    let cytoEdges =
        let len = filteredNetwork.Dimensions |> fst
        [
            for i = 0 to len - 1 do
                for j = i + 1 to len - 1 do
                    let v = filteredNetwork.[i,j]
                    if v <> 0. then yield i,j,v
        ]
        |> List.mapi (fun i (v1,v2,weight) ->
            let styling = [CyParam.weight (0.2 * weight)]
            Elements.edge ("e" + string i) (string v1) (string v2) styling
        )

    // Resulting cytograph
    let cytoGraph =
        CyGraph.initEmpty ()
        |> CyGraph.withElements cytoVertices
        |> CyGraph.withElements cytoEdges
        |> CyGraph.withStyle "node"
            [
                CyParam.shape "circle"
                CyParam.content =. CyParam.label
                CyParam.width =. CyParam.weight
                CyParam.height =. CyParam.weight
                CyParam.Text.Align.center
                CyParam.Border.color "#A00975"
                CyParam.Border.width 3
            ]
        |> CyGraph.withStyle "edge"
            [
                CyParam.Line.color "#3D1244"
            ]
        |> CyGraph.withLayout (Layout.initCose (Layout.LayoutOptions.Cose(NodeOverlap = 400, ComponentSpacing = 100)))
    // Send the cytograph to the browser
    cytoGraph
    |> CyGraph.withSize (1300,1000)
    |> CyGraph.show